WIP/Don't review: update split and splice docs #353
Conversation
```
// 2. Concatenating the blob chunks in the order of the digest list returned
// by the server results in the original blob.
//
// A client should NOT expect that the original blob is available.
```
Suggested change:
```diff
- // A client should NOT expect that the original blob is available.
+ // A client SHOULD NOT expect that the original blob is available.
```
Again, I don't think this is true.
It's up to the implementation to decide whether to keep or discard the original blob after splitting. From the client perspective, nothing should change: calling ByteStream.Read or FindMissing over the original blob after splitting SHOULD still yield the original blob.
Keep in mind that we are assuming different types of clients are accessing the same remote cache. Some clients may support Split/Splice APIs, some may NOT. And it's important to remind the server implementations that both types MAY exist at the same time.
```
// exists in the CAS, though this should be rare as manifests are typically
// created when blobs are written. The original blob digest does not need to
// be present in the CAS for this call to succeed if a manifest already exists.
```
Suggested change:
```diff
- // exists in the CAS, though this should be rare as manifests are typically
- // created when blobs are written. The original blob digest does not need to
- // be present in the CAS for this call to succeed if a manifest already exists.
+ // exists in the CAS.
```

Suggested change:
```diff
- // exists in the CAS, though this should be rare as manifests are typically
- // created when blobs are written. The original blob digest does not need to
- // be present in the CAS for this call to succeed if a manifest already exists.
+ // exists in the CAS. The original blob digest does not need to
```
Leave the manifest creation as an implementation detail and avoid discussing it in the spec. That gives the server more freedom to pick the right tradeoffs (e.g. chunking eagerly vs. lazily).
The server can keep both the chunked version and the non-chunked version of the same blob to serve both kinds of requests.
```
// digests in the list. ZSTD-compressed chunks can be concatenated without
// decompression.
```
> ZSTD-compressed chunks can be concatenated without decompression.

I don't think this is correct.
There are a few ways compression can be applied here:
a. Compress the original large blob before chunking; do not compress the chunks.
b. Compress the original large blob, chunk it, then compress the chunks additionally.
c. Chunk the uncompressed large blob, then compress the chunks individually.
From the previous discussions, I think we do not care about (a) and (b) and will mostly be doing (c). Theoretically, (c) should give us a better chunk "hit rate" when using a consistent chunking algorithm (e.g. FastCDC) across different versions of the same large blob, since the compression algorithm can shuffle contents around.
Under (c), each chunk is compressed individually. To get back the original large blob, one would need to decompress each chunk before concatenating them in the order given by the response.
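The round trip under (c) can be sketched in Python. Here stdlib `zlib` (DEFLATE) stands in for ZSTD and fixed-size chunks stand in for a content-defined chunker like FastCDC; both substitutions are purely for illustration:

```python
import zlib

blob = b"some large blob content " * 200

# Chunk the uncompressed blob first (fixed-size chunks stand in for a
# content-defined chunker such as FastCDC).
chunk_size = 1024
chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]

# Case (c): compress each chunk individually.
compressed_chunks = [zlib.compress(c) for c in chunks]

# To recover the original blob, decompress each chunk individually, then
# concatenate in the order given by the server's digest list.
rebuilt = b"".join(zlib.decompress(c) for c in compressed_chunks)
assert rebuilt == blob
```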
> I think we do not care about (a) and (b) and will mostly be doing (c).

Yes, agree.

> To get back the original large blob, one would need to decompress each chunk before concatenating them based on the order in the response.

One nice property of ZSTD is that you can concatenate first and then decompress, because with ZSTD: Decompress(Compress(A1) + Compress(A2)) == A1 + A2.
So as long as we do (c) when creating the chunks, the user can obtain a valid compressed form of the full file by concatenating the compressed chunks.
I think this is probably also not important to the spec, and I should remove it
The spec has DEFLATE and BROTLI as well, which may not share that property.
```
// which chunks are locally available and just fetch the missing ones. The
// desired blob can be assembled by concatenating the fetched chunks in the
// order of the digests in the list.
// Split retrieves information about how a blob is split into chunks.
```
Compared to the original doc, this new version is missing the part where the chunks MUST/SHOULD be stored in the CAS. It's something obvious to an experienced maintainer, but it should be documented to help newer implementations onboard the spec.
```
// When uploading a large blob using chunked upload, clients MUST first upload
// all chunks to the CAS, then call this RPC to store a manifest that describes
// how those chunks compose the original blob. The chunks referenced in the
// manifest must be available in the CAS before calling this RPC. The original
// blob does not need to be available: the correctness of the manifest can be
// validated from manifest correctness and by verifying that the chunks match
// their specified digests.
```
Suggested change:
```diff
- // When uploading a large blob using chunked upload, clients MUST first upload
- // all chunks to the CAS, then call this RPC to store a manifest that describes
- // how those chunks compose the original blob. The chunks referenced in the
- // manifest must be available in the CAS before calling this RPC. The original
- // blob does not need to be available: the correctness of the manifest can be
- // validated from manifest correctness and by verifying that the chunks match
- // their specified digests.
+ // When uploading a large blob using chunked upload, clients MUST first upload
+ // all chunks to the CAS, then call this RPC to store a manifest that describes
+ // how those chunks compose the original blob. The chunks referenced in the
+ // manifest SHOULD be available in the CAS before calling this RPC. The original
+ // blob does not need to be available: the correctness of the manifest can be
+ // validated from manifest correctness and by verifying that the chunks match
+ // their specified digests.
```
s/must/SHOULD/. We don't have a transactional guarantee between uploading chunks and calling Splice API. So a remote cache with a very short TTL can evict some of the chunks by the time the client finishes uploading and calls SpliceBlob. In those cases, the client should expect an error response from the server.
> The original blob does not need to be available:
> the correctness of the manifest can be validated from manifest correctness

Huh? Might need some rewording here.

> and by verifying that the chunks match their specified digests.
Keep in mind that SpliceBlobRequest.blob_digest is an optional field.
The client can upload a bunch of chunks and Splice, then use SpliceBlobResponse.blob_digest to continue the build. That does imply that the client "trusts" the server's splicing result, but it will save the client from having to hash the large blob itself.
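That client flow can be sketched with a hypothetical in-memory stand-in for the remote cache. The `FakeCAS` class and its `upload`/`splice_blob` methods are illustrative only, not REv2 API surface; the optional `blob_digest` mirrors the optional field in `SpliceBlobRequest`:

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class FakeCAS:
    """Hypothetical in-memory stand-in for a remote CAS (not part of REv2)."""
    def __init__(self):
        self.blobs = {}

    def upload(self, data: bytes) -> str:
        d = digest(data)
        self.blobs[d] = data
        return d

    def splice_blob(self, chunk_digests, blob_digest=None):
        # Chunks SHOULD already be in the CAS; with a short TTL some may
        # have been evicted, and the client should then expect an error.
        if any(d not in self.blobs for d in chunk_digests):
            raise LookupError("some chunks are missing from the CAS")
        spliced = b"".join(self.blobs[d] for d in chunk_digests)
        spliced_digest = digest(spliced)
        # blob_digest is optional: when given, the server verifies it; when
        # omitted, the client trusts the server's result and is saved from
        # hashing the large blob itself.
        if blob_digest is not None and blob_digest != spliced_digest:
            raise ValueError("chunks do not concatenate to blob_digest")
        self.blobs[spliced_digest] = spliced
        return spliced_digest

cas = FakeCAS()
blob = b"0123456789" * 50
chunk_digests = [cas.upload(blob[i:i + 100]) for i in range(0, len(blob), 100)]
# Client omits blob_digest and continues the build with the server's answer.
result = cas.splice_blob(chunk_digests)
assert result == digest(blob)
```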
```
// trust digests provided by the client. The server MUST validate the manifest
// and verify that all referenced chunks exist in the CAS and match their
// specified digests. The server MAY optionally verify that concatenating the
// chunks results in a blob matching the original blob digest, particularly if
// the client is not trusted. The server MAY accept a request as no-op if a
// manifest for the client-specified blob already exists; the lifetime of that
// manifest and its chunks SHOULD be extended as usual.
```
> The server MUST validate the manifest
> and verify that all referenced chunks exist in the CAS and match their
> specified digests.

This seems to assume that the large blob exists only in the manifest + chunk form, not in its original form. In reality, servers may choose to store both. So this is an implementation detail our spec should avoid taking an opinion on.

> The server MAY optionally verify that concatenating the
> chunks results in a blob matching the original blob digest, particularly if
> the client is not trusted.

We already established a few sentences earlier that "the server MUST NOT trust ... the client", so the "if the client is not trusted" part should be implied? This sentence feels unnecessary to me.

> The server MAY accept a request as no-op if a
> manifest for the client-specified blob already exists; the lifetime of that
> manifest and its chunks SHOULD be extended as usual.
I know this is from the original text, but I wonder if this should be updated.
In particular, let's say I were to send this splice request to the server.
```
{
  "blob_digest": "<big-blob-does-exist-in-CAS>",
  "chunk_digests": []  // empty - an invalid list of chunks
}
```
What the current text is advising: if blob_digest already exists, then the server MAY accept this result. It also implies that if the large blob exists in the manifest + chunks form, the manifest could be updated to the latest list of chunks. This seems wrong.
On one hand, I do think it's nice to have a short circuit way to skip having to verify the concatenated result. But it might be worth reminding the server implementations that the server should never update the underlying manifest without verifying the concatenated result. This would help us disambiguate cases where the clients want to update the chunking result to an improved chunking algorithm / configuration versus cases where malicious actors tried to replace an existing chunk manifest with a malicious one.
Actually, now that I think about this a little bit more: I think that the server should never let SpliceBlob update an existing large blob manifest.
It's ok to create a new one. In that case, the server MUST verify the concat result.
But never update existing manifests (bumping TTL should be ok).
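A minimal sketch of that rule, assuming a hypothetical in-memory `ManifestStore` (none of these names come from the spec; `cas` is just a digest-to-bytes dict):

```python
import hashlib
import time

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ManifestStore:
    """Illustrative sketch of the proposed rule: SpliceBlob may create a new
    manifest (after verifying the concatenated result) or bump the TTL of an
    existing one, but it never replaces an existing manifest."""
    def __init__(self, cas):
        self.cas = cas          # digest -> chunk bytes
        self.manifests = {}     # blob digest -> (chunk digest list, expiry)
        self.ttl = 3600

    def splice(self, blob_digest, chunk_digests):
        if blob_digest in self.manifests:
            # Existing manifest: treat as a no-op, only extend its lifetime.
            # Do NOT swap in the new chunk list, even if it would verify.
            chunks, _ = self.manifests[blob_digest]
            self.manifests[blob_digest] = (chunks, time.time() + self.ttl)
            return blob_digest
        # New manifest: MUST verify the concatenated result first.
        data = b"".join(self.cas[d] for d in chunk_digests)
        if digest(data) != blob_digest:
            raise ValueError("concatenated chunks do not match blob_digest")
        self.manifests[blob_digest] = (list(chunk_digests), time.time() + self.ttl)
        return blob_digest

cas = {}
blob = b"hello world"
parts = [blob[:5], blob[5:]]
for p in parts:
    cas[digest(p)] = p
store = ManifestStore(cas)
bd = digest(blob)
store.splice(bd, [digest(p) for p in parts])
# A later splice over the same blob digest (even with an invalid empty chunk
# list) is treated as a TTL bump only; the stored chunk list is unchanged.
store.splice(bd, [])
assert store.manifests[bd][0] == [digest(p) for p in parts]
```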
This PR better aligns the language of the REv2 API to describe how a client should expect to interact with the `Split` and `Splice` APIs.

Generally speaking, when designing a Remote Cache service, the server is not always primarily responsible for splitting and splicing blobs. In fact, the `Split` and `Splice` APIs are extremely helpful from a client's context to store and retrieve this manifest for how content-defined chunking can compose a blob.

For example, if a client calls `Splice` to map blob digest `A` to chunks `A1` and `A2`, this instructs the server to store this information. Later, if a client that is not chunking-aware calls `Read` on `A`, the server can use this stored state to compose `A` from `A1` and `A2` stored in the CAS, and serve it to the client.

Similarly, if a user calls `Split` on blob `B` (which could be some Action Result), the server would respond with its stored manifest: `B1` and `B2`. A chunking-aware client can then skip downloading `B1` if it's available locally from some other file's chunks, and download only `B2` without ever needing to download the entirety of `B`.
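The `Read` composition described above can be sketched as follows; all names are hypothetical stand-ins for server-side state, not the REv2 API:

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical server state: a chunk store plus a manifest recorded by an
# earlier Splice(A -> [A1, A2]) call.
chunks = {}
a1, a2 = b"first half of A|", b"second half of A"
chunks[digest(a1)] = a1
chunks[digest(a2)] = a2
blob_a = a1 + a2
manifest = {digest(blob_a): [digest(a1), digest(a2)]}

def read(blob_digest: str) -> bytes:
    """Serve a non-chunking-aware ByteStream.Read: when only the manifest and
    chunks are stored, compose the blob on the fly in manifest order."""
    chunk_digests = manifest[blob_digest]
    return b"".join(chunks[d] for d in chunk_digests)

assert read(digest(blob_a)) == blob_a
```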